Skip to content

fix: harden phase completion diagnostics#64

Merged
pruiz merged 4 commits into
masterfrom
fix/phase2-completion-diagnostics
Jun 20, 2026
Merged

fix: harden phase completion diagnostics#64
pruiz merged 4 commits into
masterfrom
fix/phase2-completion-diagnostics

Conversation

@pruiz

@pruiz pruiz commented Jun 17, 2026

Copy link
Copy Markdown
Owner

Summary

  • preserve and print all Phase 2 completion gate failures instead of truncating diagnostics
  • make terminal stop artifact failures auto-resume with exact repair details and final failure output
  • add timestamped phase transcript names and stricter Phase 2 finding quality checks
  • allow Phase 3 to complete as a no-op when no PENDING findings exist

Tests

  • rtk pytest -q tests/test_findings_quality.py tests/test_phases_completion.py tests/test_phase_failure_state_reset.py tests/test_event_recording.py tests/test_phase_artifacts_cli.py
  • rtk pytest -q tests/test_render_settings_propagation.py tests/test_phase_1_mid_turn_forgiveness.py tests/test_codecome_runner.py tests/test_event_recording.py
  • make tests

Summary by CodeRabbit

  • New Features

    • Phase 3 now completes successfully when no findings are pending
    • Phase 2 quality validation checks findings for completeness and accuracy
  • Improvements

    • Enhanced auto-resume behavior for incomplete phases and output budget exhaustion
    • Aggregate failure reporting displays all validation errors at once
    • Clearer validation guidance in phase completion messages
  • Tests

    • Expanded coverage for phase completion gating, artifact validation, and failure recovery scenarios

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 67116344-e59d-47df-bb62-646a4e76bd14

📥 Commits

Reviewing files that changed from the base of the PR and between bea6ace and bbd49a2.

📒 Files selected for processing (3)
  • tools/codecome/harness.py
  • tools/codecome/phase_1.py
  • tools/phases/completion.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tools/codecome/phase_1.py
  • tools/phases/completion.py
  • tools/codecome/harness.py

📝 Walkthrough

Walkthrough

Introduces tools/findings/quality.py for Phase 2 finding quality validation and wires it into artifact_checks.py and completion.py. Refactors finish-reason classification to distinguish budget exhaustion (_FINISH_BUDGET) from hard failures, extending that logic through harness and Phase 1 orchestration. Adds a Phase 3 no-pending-findings fast-path in the harness and gate. Tightens transcript filename uniqueness with timestamp/PID. Extends test coverage for all changes.

Changes

Phase 2 Quality Validation and Integration

Layer / File(s) Summary
Phase 2 finding quality validation module
tools/findings/quality.py
Defines required section lists, placeholder/template/non-finding marker constants, field-population predicates, validate_phase2_finding_quality (frontmatter + section checks), and phase2_summary_declares_no_findings.
Tests for Phase 2 finding quality
tests/test_findings_quality.py
Adds sys.path setup, _write_phase2_finding helper, and two regression tests covering rejection of test/template artifacts and case-insensitive template-marker detection.
Phase 2 artifact checker and dispatcher registration
tools/phases/artifact_checks.py
Adds check_phase_2_artifacts() validating run-summary presence and finding completeness/quality, registers it in _CHECKERS via _check_phase_2_with_ct(), and includes Phase 2 in phase="all" sweep.
Tests for Phase 2 artifact checks
tests/test_phase_artifacts_cli.py
Verifies check_phase_2_artifacts() accepts explicit "None." no-findings summaries, rejects stub findings, and aggregates all quality errors without truncation.
Phase 2 freshness helpers, completion gating, and resume prompts
tools/phases/completion.py
Adds _fresh_run_summaries(), _fresh_pending_findings(), _latest_summary_declares_no_phase2_findings(); rewrites Phase 2 gate to require fresh summaries and validate findings; updates _resume_opener_for_reason() for budget finish; changes resume prompt wording to "Fix every listed item" and appends validation-completion strictness guidance.
Tests for Phase 2 completion gating and resume prompts
tests/test_phases_completion.py
Covers accepting structured "None." no-findings summaries, rejecting vague summaries, rejecting stub findings, aggregating quality errors, verifying updated resume-prompt wording, and sweep fixture alignment.

Budget Finish Reason Handling and Phase 3 No-Op Fast-Path

Layer / File(s) Summary
Finish reason classification infrastructure refactoring
tools/rendering/events/base.py, tools/rendering/events/__init__.py
Splits _FINISH_FAILURE into _FINISH_BUDGET (length, max_tokens) and _FINISH_HARD_FAILURE (content-filter, error), keeps _FINISH_FAILURE as a legacy union alias, and re-exports both new constants from the package.
Budget exhaustion handling in harness orchestration
tools/codecome/harness.py
Imports _FINISH_BUDGET; resets finish_warning at loop start; generates specialized budget-exhaustion warnings; distinguishes mid-turn vs budget in graceful-completion decision; adds explicit break on returncode == 0; specializes auto-resume messaging; generalizes session-ID error wording; prints "remaining gate failures" list when non-empty.
Budget exhaustion handling in Phase 1 subphase orchestration
tools/codecome/phase_1.py
Imports _FINISH_BUDGET; adds finish-warning branch for budget exhaustion; extends graceful-completion logic for mid-turn vs budget; updates auto-resume trigger for both finish reasons; generalizes session-ID resume error wording.
Phase 3 no-pending-findings fast-path and gate behavior
tools/codecome/harness.py, tools/phases/gates.py
Adds datetime import, helpers to count pending Phase 3 findings and write timestamped noop summaries, and early-return fast-path when count is zero; changes gate_phase_3 to return 0 with noop message instead of failing when no findings are pending.
Tests for Phase 3 gate, harness failure reporting, and budget auto-resume
tests/test_gate_check.py, tests/test_phase_failure_state_reset.py
Verifies Phase 3 gate succeeds with "nothing to review" when no pending findings exist; tests terminal-stop auto-resume on missing artifacts, gate-failure output in stdout, and budget-exhaustion auto-resume scenarios.

Transcript Filename Uniqueness

Layer / File(s) Summary
Transcript filename uniqueness with timestamp and PID
tools/codecome/transcript.py, tests/test_event_recording.py
Transcript.for_phase() computes timestamp/PID and routes through _unique_transcript_path(); test assertion tightened to re.fullmatch validating the full timestamp/pid pattern.

Sequence Diagram(s)

sequenceDiagram
  participant HarnessLoop as run_phase_mode
  participant FinishReason as FinishReason classifier
  participant GracefulCheck as check_phase_graceful_completion
  participant Quality as validate_phase2_finding_quality
  participant ResumePath as build_phase_resume_prompt

  HarnessLoop->>HarnessLoop: reset finish_warning = None
  HarnessLoop->>FinishReason: evaluate last_finish_reason
  FinishReason-->>HarnessLoop: _FINISH_BUDGET
  HarnessLoop->>HarnessLoop: set finish_warning (budget exhaustion)
  HarnessLoop->>GracefulCheck: check_phase_graceful_completion(phase=2)
  GracefulCheck->>Quality: validate_phase2_finding_quality(finding_path)
  Quality-->>GracefulCheck: error list
  alt quality errors present
    GracefulCheck-->>HarnessLoop: (False, ["Invalid: ..."])
    HarnessLoop->>HarnessLoop: set returncode = 2
    HarnessLoop->>ResumePath: build_phase_resume_prompt(failure_details=errors)
    ResumePath-->>HarnessLoop: resume prompt with "Fix every listed item."
  else no errors
    GracefulCheck-->>HarnessLoop: (True, [])
    HarnessLoop->>HarnessLoop: clear finish_warning, break loop
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pruiz/CodeCome#43: Modifies the same tools/codecome/harness.py and tools/codecome/phase_1.py retry/resume control-flow paths that this PR extends with _FINISH_BUDGET handling.
  • pruiz/CodeCome#46: Directly modifies tools/phases/completion.py gating, build_phase_resume_prompt(failure_details=...), and the harness/phase_1 failure-detail unpacking that this PR further updates with budget-exhaustion resume messaging and stricter validation guidance.

Poem

🐇 Hoppity-hop through the phases I go,
No stub findings allowed—quality must show!
Budget exhausted? I'll warn and resume,
Phase 3 with no pending? A no-op in bloom.
Each transcript now stamped with the time and my PID,
No file gets truncated—the rabbit forbids! 🕐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.19% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: harden phase completion diagnostics' directly aligns with the main objective of improving error handling, diagnostic visibility, and completion gate enforcement across phases.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/phase2-completion-diagnostics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

Coverage Report

Metric Value
Line Coverage 76.4%
Lines Covered 0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-19T17:14:16.884Z

@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens Phase 2 completion diagnostics by splitting budget-exhaustion finish reasons (length, max_tokens) from hard-failure reasons into a dedicated _FINISH_BUDGET set, wiring auto-resume for budget-exhaustion paths, adding Phase 2 finding quality validation (tools/findings/quality.py), and allowing Phase 3 to short-circuit as a no-op when no PENDING findings exist.

  • _FINISH_BUDGET / _FINISH_HARD_FAILURE split (base.py): _FINISH_FAILURE is retained as a union alias for backward compatibility; both harness and phase_1 now distinguish budget exhaustion (retriable) from hard failures (non-retriable).
  • Phase 2 quality gate (quality.py, completion.py, artifact_checks.py): New validate_phase2_finding_quality enforces populated frontmatter fields, non-placeholder section bodies, and template-marker detection; completion gate now validates fresh findings and requires explicit "no findings" wording in summaries.
  • Phase 3 no-op (harness.py, gates.py): Zero PENDING findings causes both the gate and the harness to exit 0 with a synthetic summary rather than treating it as an error.

Confidence Score: 5/5

Safe to merge — all changed paths are well-tested and the core logic is sound.

The finish-reason reclassification keeps _FINISH_FAILURE as an exact union alias so no existing call site changes behaviour. The new Phase 2 quality gate, no-op Phase 3 path, and gate-failure printing are each covered by targeted regression tests. No data-loss or incorrect-state paths were identified.

No files require special attention.

Important Files Changed

Filename Overview
tools/rendering/events/base.py Splits _FINISH_FAILURE into _FINISH_BUDGET and _FINISH_HARD_FAILURE; legacy alias kept as union — clean and backward-compatible.
tools/codecome/harness.py Adds budget-exhaustion auto-resume path, Phase 3 no-op early exit, and gate-failure printing on final failure; logic is sound.
tools/codecome/phase_1.py Mirrors harness.py budget/mid-turn split; removes the legacy not any_step_finish_seen condition in favour of explicit set membership checks.
tools/findings/quality.py New module providing validate_phase2_finding_quality and phase2_summary_declares_no_findings; comprehensive template-marker and placeholder checks.
tools/phases/completion.py Phase 2 gate now validates finding quality and requires explicit no-findings declaration; _resume_opener_for_reason docstring references _FINISH_HARD_FAILURE but code still checks _FINISH_FAILURE.
tools/phases/artifact_checks.py Adds check_phase_2_artifacts wired into the dispatcher; static artifact check validates all PENDING findings regardless of timestamp.
tools/phases/gates.py Phase 3 gate now returns 0 (no-op) instead of 1 when no PENDING findings exist.
tools/codecome/transcript.py Timestamp + PID suffix added to transcript filenames to prevent collisions across concurrent runs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_phase_mode / _run_subphase] --> B{finish_reason}
    B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
    B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
    B -->|step_finish, no reason| G[finish_warning: ambiguous]
    C -->|yes| H[break — success]
    C -->|no| I[returncode=2\nauto-resume if retries remain]
    D --> J{phase_ok?}
    J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
    J -->|no| L[returncode=2\nauto-resume if retries remain]
    E --> M{phase_ok?}
    M -->|yes| N[finish_warning=None\ncontinue normally]
    M -->|no| O[returncode=2\nauto-resume if retries remain]
    L --> P{retries left?}
    O --> P
    I --> P
    P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
    P -->|no| R[print gate failures\nreturn 2]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[run_phase_mode / _run_subphase] --> B{finish_reason}
    B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
    B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
    B -->|step_finish, no reason| G[finish_warning: ambiguous]
    C -->|yes| H[break — success]
    C -->|no| I[returncode=2\nauto-resume if retries remain]
    D --> J{phase_ok?}
    J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
    J -->|no| L[returncode=2\nauto-resume if retries remain]
    E --> M{phase_ok?}
    M -->|yes| N[finish_warning=None\ncontinue normally]
    M -->|no| O[returncode=2\nauto-resume if retries remain]
    L --> P{retries left?}
    O --> P
    I --> P
    P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
    P -->|no| R[print gate failures\nreturn 2]
Loading

Reviews (4): Last reviewed commit: "fix: differentiate budget-exhaustion fro..." | Re-trigger Greptile

Comment thread tools/codecome/harness.py Outdated
Comment thread tools/codecome/harness.py Outdated
Comment thread tools/findings/quality.py Outdated
Comment thread tools/phases/completion.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens CodeCome’s phase completion diagnostics and artifact gating by preserving full failure details, improving auto-resume prompts, enforcing stricter Phase 2 finding quality requirements, and allowing Phase 3 to succeed as a no-op when there are no PENDING findings.

Changes:

  • Phase 2 completion gating now validates “explicit no findings” summaries and rejects template/stub PENDING findings via new quality checks.
  • Phase harness diagnostics now print remaining gate failures on terminal-stop final failures, and resume prompts are more explicit/authoritative.
  • Phase 3 is allowed to complete as a no-op when no PENDING findings exist, with a generated run summary; transcripts are uniquely timestamped.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tools/phases/gates.py Makes Phase 3 gating succeed as a no-op (warns instead of failing) when there are no PENDING findings.
tools/phases/completion.py Preserves full Phase 2 gate failures; adds “explicit no findings” support and validates Phase 2 finding quality.
tools/phases/artifact_checks.py Implements Phase 2 artifact checks (summary required; findings must be complete unless explicitly none).
tools/findings/quality.py Adds Phase 2 finding-quality validation and “no findings” summary detection helpers.
tools/codecome/transcript.py Adds timestamp+pid into transcript filenames to avoid truncation/collisions.
tools/codecome/harness.py Adds Phase 3 no-op completion path with generated summary; improves terminal-stop diagnostics output.
tests/test_phases_completion.py Expands Phase 2 completion tests for explicit no-findings and incomplete/stub finding rejection.
tests/test_phase_failure_state_reset.py Adds tests for auto-resume on missing artifacts and printing gate failures on final failure.
tests/test_phase_artifacts_cli.py Adds Phase 2 artifact-check test coverage.
tests/test_gate_check.py Updates Phase 3 gate behavior test to assert no-op success output.
tests/test_findings_quality.py Adds unit test coverage for Phase 2 finding quality validation.
tests/test_event_recording.py Updates transcript filename assertion to match new timestamp+pid naming.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/codecome/harness.py Outdated
Comment thread tools/codecome/harness.py Outdated
pruiz added 2 commits June 19, 2026 11:45
- Use generator counting in _pending_finding_count()
- Capture datetime.now() once in _write_phase3_noop_summary()
- Remove hardcoded Target path from Phase 3 noop summary
- Make _contains_template_marker() case-insensitive
- Avoid redundant _fresh_run_summaries call in completion gate
- Add test for lowercase template guidance detection
Split _FINISH_FAILURE into _FINISH_BUDGET (length, max_tokens) and
_FINISH_HARD_FAILURE (content-filter, content_filter, error).

Budget-exhaustion reasons now trigger check_phase_graceful_completion
and auto-resume, matching the terminal-stop + missing-artifacts path.
Added regression test for length auto-resume.

Also updated phase_1._run_subphase to mirror the same behaviour.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codecome/phase_1.py`:
- Around line 590-593: The condition at lines 590-593 now handles both
_FINISH_MID_TURN and _FINISH_BUDGET, but lines 599 and 611 appear to describe
all these cases uniformly as mid-turn cutoffs. Update the code at lines 599 and
611 to differentiate the failure reason based on whether last_finish_reason is
in _FINISH_BUDGET or _FINISH_MID_TURN, so that budget exhaustion is clearly
reported as such rather than mislabeled as a mid-turn cutoff. Use a conditional
check (such as checking if last_finish_reason is in _FINISH_BUDGET) to generate
appropriate distinct messages or handling for each case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f91e9b37-8f1a-4178-8a73-5ce2faa765f1

📥 Commits

Reviewing files that changed from the base of the PR and between e2f35c3 and bea6ace.

📒 Files selected for processing (7)
  • tests/test_phase_failure_state_reset.py
  • tools/codecome/harness.py
  • tools/codecome/phase_1.py
  • tools/phases/completion.py
  • tools/rendering/events/__init__.py
  • tools/rendering/events/base.py
  • tools/rendering/events/step_finish.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tools/phases/completion.py
  • tools/codecome/harness.py

Comment thread tools/codecome/phase_1.py

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Comment thread tools/codecome/harness.py
Comment thread tools/phases/completion.py
Comment thread tools/rendering/events/step_finish.py Outdated
…sages

- Auto-resume [budget] now says output budget exhaustion instead of mid-turn
- No-session-ID fallback messages use generic wording for both cases
- Docstring updated: length moved from _FINISH_MID_TURN to _FINISH_BUDGET
- step_finish.py: removed redundant _FINISH_BUDGET check (_FINISH_FAILURE
  already covers it)
@pruiz pruiz merged commit 3855c46 into master Jun 20, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants